Cellen 2

Data visualisation — making a plot with ggplot2

Gavin Simpson

Aarhus University

2025-02-24

The anatomy of a plot

Visualisation involves representing data by lines, shapes, colours, etc.

Map data to visual channels — some channels more effective than others

ggplot provides a set of tools to

  • map data to visual elements,
  • specify the kind of plot, and
  • control the fine details of the final plot

The anatomy of a ggplot

A ggplot comprises several main elements

  1. Data
  2. Aesthetic mappings
  3. Geoms
  4. Co-ordinates & scales
  5. Labels & guides

Getting set up

Load some packages that we need for plotting and working with data

library("ggplot2")
library("dplyr")

ggplot()

Main function is ggplot()

  • specify data, the data frame containing the data
  • specify mappings of variables in data to aesthetics with aes()

Add layers to the plot via +

Geoms are the main layer-types we add to influence the plot

Geoms by default inherit the data and aesthetics from the ggplot() call

ggplot(data_frame, aes(x = var1, y = var2, colour = var3)) +
    geom_<type>(....) +
    geom_<type>(....)

Data

Two main ways in which data tend to be recorded

  1. wide-format
  2. long-format

In long-format:

  • every column is a variable
  • every row is an observation

In wide-format:

  • some variables are spread out over multiple columns

ggplot requires data in long form
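
A minimal sketch of going from wide to long, assuming the tidyr package is installed (it is not loaded elsewhere in these slides). The data frame `wide` and its column names are made up for illustration.

```r
library("tidyr")   # assumption: tidyr is available

# Hypothetical wide-format data: the counts are spread over one column per year
wide <- data.frame(
  site  = c("A", "B"),
  y2023 = c(10, 12),
  y2024 = c(11, 15)
)

# Gather the year columns so that `year` and `count` each become one variable
long <- pivot_longer(
  wide,
  cols = c(y2023, y2024),
  names_to = "year",
  names_prefix = "y",   # strip the leading "y" from the old column names
  values_to = "count"
)
long
```

Each row of `long` is now one observation (a site in a year), which is the shape ggplot() expects.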

palmerpenguins data set

library("palmerpenguins")               # load package
penguins                                # print data frame
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
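
Row 4 of the printout contains missing values. Rows like this cannot be drawn, and ggplot2 typically warns that it has removed them. A quick check, using only base R, of how many penguins lack one of the two variables plotted later; the object names `vars` and `incomplete` are illustrative.

```r
library("palmerpenguins")

# Which rows lack a flipper length or a bill length?
vars <- c("flipper_length_mm", "bill_length_mm")
incomplete <- !complete.cases(penguins[, vars])

sum(incomplete)          # number of penguins missing at least one of the two values
penguins[incomplete, ]   # the affected rows
```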

Our first plot

Say we want to plot bill length (bill_length_mm) against flipper length (flipper_length_mm)

p <- ggplot(data = penguins)

We tell ggplot() where to look for variables, but haven't specified any mappings yet

We assigned the output of the ggplot() call to the object p (the name p is arbitrary)

In RStudio, Alt + - (Windows and Linux) or Option + - (macOS) types the assignment operator <-

Our first plot

We specify mappings between variables and aesthetics via the mapping argument

Use the aes() function to specify the mappings

p <- ggplot(data = penguins,
            mapping = aes(x = flipper_length_mm, y = bill_length_mm))

This sets up a mapping between our two variables and the x and y aesthetics

The x and y aesthetics are the \(x\) and \(y\) coordinates of the plot

Our first plot

We can draw the plot by print()ing the object p

What do you think you'll get if you print p?

p

Our first plot

Only the scales for the x and y aesthetics are drawn

p
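
One way to see why nothing but the axes appears is to inspect the plot object. This is a sketch that assumes the object exposes its `layers` and `mapping` components via `$`, as ggplot2 objects generally do.

```r
library("ggplot2")
library("palmerpenguins")

p <- ggplot(data = penguins,
            mapping = aes(x = flipper_length_mm, y = bill_length_mm))

length(p$layers)   # how many layers the plot has so far
names(p$mapping)   # which aesthetics have been mapped
```

The mappings are in place, but without a layer there is nothing to tell ggplot how to draw the data.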

Adding a layer

Need to tell ggplot() how we want the data drawn

Need to choose a geometric object or geom

geoms are functions with names geom_<type>()

A geom adds a layer to an existing plot

For a scatterplot, we represent the \(x\), \(y\) pairs via points geom_point()

Adding a layer

p + geom_point()

Putting it all together

ggplot(data = penguins,
       mapping = aes(x = flipper_length_mm, y = bill_length_mm)) +
    geom_point()

Solution

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm, y = body_mass_g)) +
    geom_point()

geoms don't always draw the data

p  <- ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = bill_length_mm))
p + geom_smooth()

geom_smooth() adds a smoother

Here we see the effect of a statistical summary associated with a geom
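
To look at the summary itself, layer_data() returns the data a layer draws after its statistic has been computed. A short sketch; the object name `smoothed` is illustrative.

```r
library("ggplot2")
library("palmerpenguins")

p <- ggplot(data = penguins,
            mapping = aes(x = flipper_length_mm, y = bill_length_mm))

# Data for the first (and only) layer, after the smoother has been fitted
smoothed <- layer_data(p + geom_smooth(), i = 1)

nrow(penguins)   # rows in the raw data
nrow(smoothed)   # rows computed by the smoother
head(smoothed[, c("x", "y", "ymin", "ymax")])
```

The smoother is evaluated on a grid of x values, so the layer draws fitted values and an interval rather than the observations.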

Plots with multiple layers

ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point() + geom_smooth()

Geoms inherit data and mappings

Didn't need to tell each geom what data or mappings to use

Information is inherited from the main ggplot() object

Can override this

ggplot(data = penguins) + 
  geom_point(aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_smooth(aes(x = flipper_length_mm, y = bill_length_mm))
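
Data can be overridden per layer as well as mappings. A sketch in which the points use all penguins but the smoother is fitted to one species only; the object names `adelie` and `plt` are illustrative.

```r
library("ggplot2")
library("dplyr")
library("palmerpenguins")

# A subset of the data to use in one layer only
adelie <- filter(penguins, species == "Adelie")

plt <- ggplot(data = penguins,
              mapping = aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point() +                               # inherits data and mappings
  geom_smooth(data = adelie, method = "lm")    # own data, inherited mappings
plt
```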

Mapping vs setting aesthetics

ggplot(data = penguins,
    mapping = aes(x = flipper_length_mm, y = bill_length_mm, colour = species)) +
  geom_point()

Solution

ggplot(data = penguins,
       mapping = aes(x = bill_depth_mm, y = body_mass_g, colour = sex)) +
    geom_point()

Setting aesthetics — the wrong way

ggplot(data = penguins,
       mapping = aes(x = flipper_length_mm, y = bill_length_mm, colour = "purple")) + 
  geom_point()

Setting aesthetics — the right way

Mappings are in aes(), settings go outside aes()

ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = bill_length_mm)) + 
  geom_point(colour = "purple")
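
The difference shows up in the data each layer draws. Inside aes(), "purple" is treated as a data value and handed to the colour scale; outside aes(), it is used as the colour itself. A sketch using layer_data(); the names `base`, `wrong` and `right` are illustrative.

```r
library("ggplot2")
library("palmerpenguins")

base <- ggplot(data = penguins,
               mapping = aes(x = flipper_length_mm, y = bill_length_mm))

wrong <- base + geom_point(aes(colour = "purple"))  # mapped: "purple" is data
right <- base + geom_point(colour = "purple")       # set: "purple" is the colour

unique(layer_data(wrong)$colour)  # a colour picked by the default colour scale
unique(layer_data(right)$colour)  # the colour that was asked for
```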

alpha controls transparency, size controls how big points are, linewidth controls how thick lines are (ggplot2 3.4.0 and later)

ggplot(
  penguins,
  aes(
    x = flipper_length_mm,
    y = bill_length_mm
    )
  ) +
  geom_point(alpha = 0.3) +
  geom_smooth(
    method = "lm",
    colour = "orange",
    se = FALSE,
    linewidth = 2
  )

labs()

ggplot(
  penguins,
  aes(
    x = flipper_length_mm,
    y = bill_length_mm
    )
  ) +
  geom_point(alpha = 0.3) +
  geom_smooth(
    method = "lm",
    colour = "orange",
    se = FALSE,
    linewidth = 2
  ) +
  labs(
    x = "Flipper length (mm)",
    y = "Bill length (mm)",
    title = "How big are penguins anyway?",
    subtitle = "Data points are individual penguins",
    caption = "Source: palmerpenguins")

labs() — setting plot labels

Reusing elements

You can save time and effort by reusing plot elements

my_labs <- labs(
    x = "Flipper length (mm)",
    y = "Bill length (mm)",
    title = "How big are penguins anyway?",
    subtitle = "Data points are individual penguins",
    caption = "Source: palmerpenguins")

Then reuse

p + geom_point() + geom_smooth() +
    scale_x_log10() + my_labs
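
Layers can be reused in the same way: a list of plot elements can be added to a plot with +. A sketch; the names `points_and_line`, `plt_flipper` and `plt_depth` are illustrative.

```r
library("ggplot2")
library("palmerpenguins")

# Store two layers together so they can be added in one go
points_and_line <- list(
  geom_point(alpha = 0.3),
  geom_smooth(method = "lm", se = FALSE)
)

# The same layers applied to two different pairs of variables
plt_flipper <- ggplot(penguins, aes(x = flipper_length_mm, y = bill_length_mm)) +
  points_and_line
plt_depth <- ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  points_and_line
```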

Matching aesthetics

ggplot(penguins,
    aes(x = flipper_length_mm, y = bill_length_mm,
        colour = species, fill = species)) +
  geom_point() +
  geom_smooth(method = "lm") +
  my_labs

Mapping aesthetics per geom

ggplot(penguins,
    aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  geom_smooth(method = "lm") +
  my_labs
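
Where colour is mapped decides how many smoothers are fitted. Mapped in ggplot(), it is inherited by geom_smooth(), which then fits one line per species; mapped in geom_point() only, the smoother sees a single group. A sketch that counts the groups in the smoother layer; the names `global` and `per_geom` are illustrative.

```r
library("ggplot2")
library("palmerpenguins")

# colour mapped in ggplot(): inherited by both layers
global <- ggplot(penguins,
                 aes(x = flipper_length_mm, y = bill_length_mm,
                     colour = species)) +
  geom_point() +
  geom_smooth(method = "lm")

# colour mapped in geom_point() only
per_geom <- ggplot(penguins,
                   aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  geom_smooth(method = "lm")

# Number of separate smoothers drawn by the second layer of each plot
length(unique(layer_data(global, i = 2)$group))
length(unique(layer_data(per_geom, i = 2)$group))
```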

Mapping continuous variables to other aesthetics

ggplot(penguins,
    aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(colour = body_mass_g)) +
  my_labs

Mapping continuous variables to other aesthetics

ggplot(penguins,
    aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(colour = log10(body_mass_g))) +
  my_labs

ggsave() — Saving your work

Plots can be rendered to disk in a range of formats — PNG, PDF, …

Type of file depends on the extension given in filename

ggsave() saves the last ggplot object plotted

ggplot(penguins,
    aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(colour = log10(body_mass_g))) +
  my_labs
## save the last plot
ggsave("my-plot.png")

ggsave() saves a specific ggplot object if given one

my_plt <- ggplot(penguins,
    aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(colour = log10(body_mass_g))) +
  my_labs

## save a specific plot object
ggsave('my-plot.pdf', plot = my_plt)

ggsave() — Specifying size

ggsave() measures width and height in inches by default & takes the size from the current graphics device if not specified

Can set width and height to numeric values and select the units via units

my_plt <- ggplot(penguins,
    aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(colour = log10(body_mass_g))) +
  my_labs

## save a specific plot object
ggsave('my-plot-cm.pdf', plot = my_plt, height = 10, width = 20, units = 'cm')
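
A self-contained sketch of the same idea that also checks the file was written. The file is saved to a temporary directory here only to keep the example tidy; the name `out_file` is illustrative.

```r
library("ggplot2")
library("palmerpenguins")

my_plt <- ggplot(penguins,
    aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point()

# Hypothetical output location inside R's temporary directory
out_file <- file.path(tempdir(), "my-plot-cm.pdf")

# A 20 cm by 10 cm PDF; the format is chosen from the .pdf extension
ggsave(out_file, plot = my_plt, width = 20, height = 10, units = "cm")

file.exists(out_file)   # confirm that the file is on disk
```

Giving width, height and units explicitly means the saved plot does not depend on the size of whatever graphics window happens to be open.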